Sequencing and Raw Sequence Data Quality Control ◾ 23
acts as a table of contents. Clicking a Summary item takes you to that item's graph. In the
following, we will discuss each item that may be included in the QC report.
1.5.1 Basic Statistics
The Basic Statistics table, as shown in Figure 1.11, includes summary information about the
FASTQ file analyzed with FastQC. The Basic Statistics title is always green and never
shows a warning sign. The table lists the file name, file type, encoding, total sequences
(number of reads), filtered (flagged) sequences, sequence length, and %GC content. The
“Filename” field gives the name of the FASTQ file analyzed. The “File type” field
indicates whether the file contains conventional base calls or data that had to be converted
to base calls (for example, colorspace data). The “Encoding” field indicates the ASCII
encoding used for the Phred quality scores.
Since the encoding in the table is “Sanger / Illumina 1.9”, the Phred+33 encoding (Q33)
was used to encode the per-base quality scores (as described above). The “Total Sequences”
field shows the number of records (reads) in the FASTQ file. The “Filtered Sequences” field
shows the number of reads removed from the analysis if the FASTQ file is in Casava format
(Casava FASTQ files will be discussed in Chapter 7). The “Sequence Length” field shows
either the lengths of the shortest and longest reads, if the read length is variable, or a
single value if all reads in the FASTQ file have the same length. The “%GC” field shows the
overall percentage of GC content across all bases in all reads.
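To make these fields concrete, the following minimal Python sketch computes comparable values directly from a FASTQ stream. It is not FastQC's own implementation: the function names (read_fastq, basic_statistics, phred33_scores) are hypothetical, the input is assumed to be an uncompressed FASTQ with exactly four lines per record, and %GC is assumed to be computed over A, C, G, and T only, so rounding and the treatment of N bases may differ from the report.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            break  # end of file (or a trailing blank line)
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\r\n")
        yield header, seq, qual


def phred33_scores(quality):
    """Decode a Phred+33 quality string into integer quality scores."""
    return [ord(ch) - 33 for ch in quality]


def basic_statistics(handle):
    """Return total sequences, sequence length (single value or range), and %GC."""
    total = 0
    min_len = None
    max_len = 0
    gc = 0
    acgt = 0
    for _, seq, _ in read_fastq(handle):
        total += 1
        n = len(seq)
        min_len = n if min_len is None else min(min_len, n)
        max_len = max(max_len, n)
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        # Assumption: N and other ambiguity codes are left out of the denominator.
        acgt += sum(s.count(base) for base in "ACGT")
    if total == 0:
        length = "0"
    elif min_len == max_len:
        length = str(min_len)
    else:
        length = f"{min_len}-{max_len}"
    return {
        "total_sequences": total,
        "sequence_length": length,
        "percent_gc": 100.0 * gc / acgt if acgt else 0.0,
    }


if __name__ == "__main__":
    example = io.StringIO("@r1\nGATTACA\n+\nIIIIIII\n@r2\nGGCC\n+\nIIII\n")
    print(basic_statistics(example))
    print(phred33_scores("IIII"))
```

For a real file, the in-memory example can be replaced with open("reads.fastq"), where the file name is a placeholder; a gzip-compressed file would need gzip.open in text mode instead.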
We should pay attention to “Total Sequences” when the analyzed FASTQ files are paired-end:
the number of sequences must be the same in both files (forward and reverse), as shown in
Figure 1.12. Otherwise, some programs used in the analysis may fail or report errors,
because they expect the reads to be matched pairs that appear in the same order in both files.
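The pairing requirement can also be checked outside FastQC. The sketch below counts the records in a forward and a reverse file and reports the first position at which the read IDs disagree. The function names (read_ids, check_pairs) are hypothetical, and the ID handling rests on an assumption about header layout: the read ID is taken to be the first whitespace-delimited token of the header, with a trailing "/1" or "/2" removed if present. Files with other naming schemes would need a different normalization.

```python
import io
from itertools import zip_longest


def read_ids(handle):
    """Yield the normalized read ID of each four-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 0 and line.strip():
            tokens = line[1:].split()  # drop the leading '@'
            read_id = tokens[0] if tokens else ""
            # Assumption: older headers mark the mate with a /1 or /2 suffix.
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]
            yield read_id


def check_pairs(forward, reverse):
    """Return (n_forward, n_reverse, index of first mismatched pair or None)."""
    n_fwd = 0
    n_rev = 0
    first_mismatch = None
    pairs = zip_longest(read_ids(forward), read_ids(reverse))
    for index, (fwd_id, rev_id) in enumerate(pairs):
        if fwd_id is not None:
            n_fwd += 1
        if rev_id is not None:
            n_rev += 1
        if first_mismatch is None and fwd_id != rev_id:
            first_mismatch = index
    return n_fwd, n_rev, first_mismatch


if __name__ == "__main__":
    fwd = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nACGT\n+\nIIII\n")
    rev = io.StringIO("@r1/2\nTTGA\n+\nIIII\n@r2/2\nTTGA\n+\nIIII\n")
    print(check_pairs(fwd, rev))
```

Equal counts with no mismatch suggest the two files are properly paired; unequal counts or an early mismatch usually mean that reads were filtered from one file without removing their mates from the other.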
We should also pay attention to “Sequence Length”. A range of numbers, as shown in
Figure 1.13, indicates that the reads in the file are not all the same length; a read can
have any length between the two numbers. Some programs used in the analysis
expect all reads in a FASTQ file to have equal length.
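When the report shows a range, it can help to see how the reads are distributed within it before deciding how to handle a tool that requires uniform length. The following sketch tallies read lengths with a counter. The function name length_counts is hypothetical, and the code again assumes an uncompressed FASTQ with exactly four lines per record.

```python
import io
from collections import Counter


def length_counts(handle):
    """Count how many reads have each length in a four-line-per-record FASTQ."""
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the sequence line of each record
            counts[len(line.rstrip("\r\n"))] += 1
    return counts


if __name__ == "__main__":
    example = io.StringIO(
        "@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n@r3\nACGT\n+\nIIII\n"
    )
    counts = length_counts(example)
    for length in sorted(counts):
        print(length, counts[length])
    print("uniform length:", len(counts) == 1)
```

A single key in the result corresponds to the single value that FastQC reports for “Sequence Length”; several keys correspond to a range.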
FIGURE 1.12 The basic statistics of the QC report.